632 research outputs found

    Certified Roundoff Error Bounds Using Semidefinite Programming.

    Get PDF
    Roundoff errors cannot be avoided when implementing numerical programs with finite precision. The ability to reason about rounding is especially important if one wants to explore a range of potential representations, for instance for FPGAs or custom hardware implementation. This problem becomes challenging when the program does not employ solely linear operations as non-linearities are inherent to many interesting computational problems in real-world applications. Existing solutions to reasoning are limited in the presence of nonlinear correlations between variables, leading to either imprecise bounds or high analysis time. Furthermore, while it is easy to implement a straightforward method such as interval arithmetic, sophisticated techniques are less straightforward to implement in a formal setting. Thus there is a need for methods which output certificates that can be formally validated inside a proof assistant. We present a framework to provide upper bounds on absolute roundoff errors. This framework is based on optimization techniques employing semidefinite programming and sums of squares certificates, which can be formally checked inside the Coq theorem prover. Our tool covers a wide range of nonlinear programs, including polynomials and transcendental operations as well as conditional statements. We illustrate the efficiency and precision of this tool on non-trivial programs coming from biology, optimization and space control. Our tool produces more precise error bounds for 37 percent of all programs and yields better performance in 73 percent of all programs

    Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers

    Get PDF
    © 2015 IEEE.Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings and thus scales well. Exploiting the race-freedom guarantees provided by Whoop, we achieve a sound partial-order reduction that significantly accelerates Corral, an industrial-strength bug-finder for concurrent programs. Using the combination of Whoop and Corral, we analyzed 16 drivers from the Linux 4.0 kernel, achieving 1.5 - 20× speedups over standalone Corral

    Estimating the WCET of GPU-Accelerated Applications using Hybrid Analysis

    No full text

    Dynamic race detection for C++11

    Get PDF
    The intricate rules for memory ordering and synchronisation associated with the C/C++11 memory model mean that data races can be difficult to eliminate from concurrent programs. Dynamic data race analysis can pinpoint races in large and complex applications, but the state-of-the-art ThreadSanitizer (tsan) tool for C/C++ considers only sequentially consistent program executions, and does not correctly model synchronisation between C/C++11 atomic operations. We present a scalable dynamic data race analysis for C/C++11 that correctly captures C/C++11 synchronisation, and uses instrumentation to support exploration of a class of non sequentially consistent executions. We concisely define the memory model fragment captured by our instrumentation via a restricted axiomatic semantics, and show that the axiomatic semantics permits exactly those executions explored by our instrumentation. We have implemented our analysis in tsan, and evaluate its effectiveness on benchmark programs, enabling a comparison with the CDSChecker tool, and on two large and highly concurrent applications: the Firefox and Chromium web browsers. Our results show that our method can detect races that are beyond the scope of the original tsan tool, and that the overhead associated with applying our enhanced instrumentation to large applications is tolerable

    Concurrency testing using schedule bounding: an empirical study

    No full text

    Implementing and evaluating candidate-based invariant generation

    Get PDF
    The discovery of inductive invariants lies at the heart of static program verification. Presently, many automatic solutions to inductive invariant generation are inflexible, only applicable to certain classes of programs, or unpredictable. An automatic technique that circumvents these deficiencies to some extent is candidate-based invariant generation , whereby a large number of candidate invariants are guessed and then proven to be inductive or rejected using a sound program analyser. This paper describes our efforts to apply candidate-based invariant generation in GPUVerify, a static checker of programs that run on GPUs. We study a set of 383 GPU programs that contain loops, drawn from a number of open source suites and vendor SDKs. Among this set, 253 benchmarks require provision of loop invariants for verification to succeed. We describe the methodology we used to incrementally improve the invariant generation capabilities of GPUVerify to handle these benchmarks, through candidate-based invariant generation , whereby potential program invariants are speculated using cheap static analysis and subsequently either refuted or proven. We also describe a set of experiments that we used to examine the effectiveness of our rules for candidate generation, assessing rules based on their generality (the extent to which they generate candidate invariants), hit rate (the extent to which the generated candidates hold), effectiveness (the extent to which provable candidates actually help in allowing verification to succeed), and influence (the extent to which the success of one generation rule depends on candidates generated by another rule). We believe that our methodology for devising and evaluation candidate generation rules may serve as a useful framework for other researchers interested in candidate-based invariant generation. The candidates produced by GPUVerify help to verify 231 of these 253 programs. An increase in precision, however, has created sluggishness in GPUVerify because more candidates are generated and hence more time is spent on computing those which are inductive invariants. To speed up this process, we have investigated four under-approximating program analyses that aim to reject false candidates quickly and a framework whereby these analyses can run in sequence or in parallel. Across two platforms, running Windows and Linux, our results show that the best combination of these techniques running sequentially speeds up invariant generation across our benchmarks by 1 . 17 × (Windows) and 1 . 01 × (Linux), with per-benchmark best speedups of 93 . 58 × (Windows) and 48 . 34 × (Linux), and worst slowdowns of 10 . 24 × (Windows) and 43 . 31 × (Linux). We find that parallelising the strategies marginally improves overall invariant generation speedups to 1 . 27 × (Windows) and 1 . 11 × (Linux), maintains good best-case speedups of 91 . 18 × (Windows) and 44 . 60 × (Linux), and, importantly, dramatically reduces worst-case slowdowns to 3 . 15 × (Windows) and 3 . 17 × (Linux)

    The effects of FES cycling combined with virtual reality racing biofeedback on voluntary function after incomplete SCI: a pilot study

    Get PDF
    BACKGROUND: Functional Electrical Stimulation (FES) cycling can benefit health and may lead to neuroplastic changes following incomplete spinal cord injury (SCI). Our theory is that greater neurological recovery occurs when electrical stimulation of peripheral nerves is combined with voluntary effort. In this pilot study, we investigated the effects of a one-month training programme using a novel device, the iCycle, in which voluntary effort is encouraged by virtual reality biofeedback during FES cycling. METHODS: Eleven participants (C1-T12) with incomplete SCI (5 sub-acute; 6 chronic) were recruited and completed 12-sessions of iCycle training. Function was assessed before and after training using the bilateral International Standards for Neurological Classification of SCI (ISNC-SCI) motor score, Oxford power grading, Modified Ashworth Score, Spinal Cord Independence Measure, the Walking Index for Spinal Cord Injury and 10 m-walk test. Power output (PO) was measured during all training sessions. RESULTS: Two of the 6 participants with chronic injuries, and 4 of the 5 participants with sub-acute injuries, showed improvements in ISNC-SCI motor score > 8 points. Median (IQR) improvements were 3.5 (6.8) points for participants with a chronic SCI, and 8.0 (6.0) points for those with sub-acute SCI. Improvements were unrelated to other measured variables (age, time since injury, baseline ISNC-SCI motor score, baseline voluntary PO, time spent training and stimulation amplitude; p > 0.05 for all variables). Five out of 11 participants showed moderate improvements in voluntary cycling PO, which did not correlate with changes in ISNC-SCI motor score. Improvement in PO during cycling was positively correlated with baseline voluntary PO (R2 = 0.50; p  0.05). The iCycle was not suitable for participants who were too weak to generate a detectable voluntary torque or whose effort resulted in a negative torque. CONCLUSIONS: Improved ISNC-SCI motor scores in chronic participants may be attributable to the iCycle training. In sub-acute participants, early spontaneous recovery and changes due to iCycle training could not be distinguished. The iCycle is an innovative progression from existing FES cycling systems, and positive results should be verified in an adequately powered controlled trial. TRIAL REGISTRATION: ClinicalTrials.gov, NCT03834324. Registered 06 February 2019 - Retrospectively registered, https://clinicaltrials.gov/ct2/show/NCT03834324. Protocol V03, dated 06.08.2015

    Uncovering Bugs in Distributed Storage Systems during Testing (not in Production!)

    Get PDF
    Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors. Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging

    Automatic analysis of DMA races using model checking and k-induction

    No full text
    Modern multicore processors, such as the Cell Broadband Engine, achieve high performance by equipping accelerator cores with small "scratch- pad" memories. The price for increased performance is higher programming complexity - the programmer must manually orchestrate data movement using direct memory access (DMA) operations. Programming using asynchronous DMA operations is error-prone, and DMA races can lead to nondeterministic bugs which are hard to reproduce and fix. We present a method for DMA race analysis in C programs. Our method works by automatically instrumenting a program with assertions modeling the semantics of a memory flow controller. The instrumented program can then be analyzed using state-of-the-art software model checkers. We show that bounded model checking is effective for detecting DMA races in buggy programs. To enable automatic verification of the correctness of instrumented programs, we present a new formulation of k-induction geared towards software, as a proof rule operating on loops. Our techniques are implemented as a tool, Scratch, which we apply to a large set of programs supplied with the IBM Cell SDK, in which we discover a previously unknown bug. Our experimental results indicate that our k-induction method performs extremely well on this problem class. To our knowledge, this marks both the first application of k-induction to software verification, and the first example of software model checking in the context of heterogeneous multicore processors. © Springer Science+Business Media, LLC 2011

    The design and implementation of a verification technique for GPU Kernels

    Get PDF
    We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The SDV semantics provides operational definitions for barrier divergence and for both inter- and intra-group data races. We build on the semantics to develop a method for reducing the task of verifying a massively parallel GPU kernel to that of verifying a sequential program. This completely avoids the need to reason about thread interleavings, and allows existing techniques for sequential program verification to be leveraged. We describe an efficient encoding of data race detection and propose a method for automatically inferring the loop invariants that are required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, that can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 162 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels
    • …
    corecore